Generating the Reduced Set by Systematic Sampling
Abstract
Computational difficulties arise when a conventional support vector machine (SVM) with nonlinear kernels is applied to massive datasets. The reduced support vector machine (RSVM) avoids these difficulties by replacing the fully dense square kernel matrix in the nonlinear SVM formulation with a small rectangular kernel matrix. In this paper, we propose a new algorithm, Systematic Sampling RSVM (SSRSVM), which selects informative data points to form the reduced set, whereas RSVM selects them at random. The algorithm is inspired by a key idea of SVM: the classifier can be represented by its support vectors, and the misclassified points are among the support vectors. SSRSVM starts with an extremely small initial reduced set and, based on the current classifier, iteratively adds a portion of the misclassified points to the reduced set until the validation set correctness is large enough. In our experiments, we tested SSRSVM on six publicly available datasets. The results suggest that SSRSVM may automatically generate a smaller reduced set than random sampling does. Moreover, at the same level of test set correctness, SSRSVM is faster than RSVM and much faster than a conventional SVM.
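The iterative loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the solver, the kernel, or the rule for choosing which misclassified points to add. Here a ridge-regularized least-squares fit on the rectangular kernel stands in for the RSVM solver, a Gaussian kernel is assumed, misclassification is measured on the training set, and the added portion is a uniform random subset. All function names and parameters (`fit_reduced`, `ssrsvm_sketch`, `portion`, `target`, and so on) are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Rectangular Gaussian kernel matrix of shape (len(A), len(B))."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_reduced(X, y, R, gamma=1.0, reg=1e-3):
    """Fit f(x) = K(x, R) u + b by regularized least squares.

    Assumption: this is a simple stand-in for the RSVM solver,
    which the abstract does not describe.
    """
    H = np.hstack([gaussian_kernel(X, R, gamma), np.ones((len(X), 1))])
    return np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ y)

def predict(X, R, w, gamma=1.0):
    """Return labels in {-1, +1} from the reduced-kernel classifier."""
    score = gaussian_kernel(X, R, gamma) @ w[:-1] + w[-1]
    return np.where(score >= 0, 1.0, -1.0)

def ssrsvm_sketch(X, y, X_val, y_val, init_size=2, portion=0.2,
                  target=0.95, max_iter=20, gamma=1.0, seed=0):
    """Grow the reduced set from misclassified training points."""
    rng = np.random.default_rng(seed)
    # Start from an extremely small reduced set.
    idx = rng.choice(len(X), size=init_size, replace=False)
    for it in range(max_iter):
        w = fit_reduced(X, y, X[idx], gamma)
        val_acc = float(np.mean(predict(X_val, X[idx], w, gamma) == y_val))
        # Stop once validation correctness is large enough.
        if val_acc >= target or it == max_iter - 1:
            break
        wrong = np.flatnonzero(predict(X, X[idx], w, gamma) != y)
        wrong = np.setdiff1d(wrong, idx)  # skip points already selected
        if wrong.size == 0:
            break
        # Assumption: add a uniform random portion of the misclassified points.
        k = max(1, int(portion * wrong.size))
        idx = np.concatenate([idx, rng.choice(wrong, size=k, replace=False)])
    return idx, w, val_acc
```

The function returns the indices of the reduced set, the coefficients fitted on exactly that set, and the last validation correctness. The `target` and `portion` values control how quickly the reduced set grows and when the loop stops; both defaults are arbitrary choices for illustration.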
Similar Resources
A Study on the Accuracy and Precision of Estimation of the Number, Basal Area and Standing Trees Volume per Hectare Using of some Sampling Methods in Forests of NavAsalem
The present study aimed to investigate the accuracy and precision of estimates of the number, basal area and volume of standing trees obtained by random and systematic random sampling in the forests of West Guilan. The cost or inventory time was assessed using the criterion (E%² × T). The inventory was carried out by complete sampling (census) in an area of 52 hectares. The study area (sect...
Hybrid Artificial Neural Networks Based on ACO-RPROP for Generating Multiple Spectrum-Compatible Artificial Earthquake Records for Specified Site Geology
The main objective of this paper is to use ant-optimized neural networks to generate artificial earthquake records. To this end, training accelerograms are selected according to the site geology of the recording station, and the Wavelet Packet Transform (WPT) is used to decompose these records. Then Artificial Neural Networks (ANN) optimized with Ant Colony Optimization and resilient Backpropagation algorith...
Systematic Sampling for Suspended Sediment
Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that assures these results. However, probabi...
Generating Antithetic Random Variates in Simulation of a Replacement Process by Rejection Method
When the times between renewals in a renewal process are not exponentially distributed, simulation can become a viable method of analysis. The renewal function is estimated through simulation for a renewal process with gamma distributed renewal times and shape parameter a > 1. Gamma random deviates will be generated by means of the so called Acceptance Rejec...
Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and Cambridge Structural Database
Here, we present the algorithm and validation for OMEGA, a systematic, knowledge-based conformer generator. The algorithm consists of three phases: assembly of an initial 3D structure from a library of fragments; exhaustive enumeration of all rotatable torsions using values drawn from a knowledge-based list of angles, thereby generating a large set of conformations; and sampling of this set by ...
Publication date: 2004